We present a new AI task, Embodied Question Answering (EmbodiedQA), where an agent is spawned at a random location in a 3D environment and asked a question ('What color is the car?'). In order to answer, the agent must first intelligently navigate to explore the environment, gather information through first-person (egocentric) vision, and then answer the question ('orange'). This challenging task requires a range of AI skills: active perception, language understanding, goal-driven navigation, commonsense reasoning, and grounding of language into actions. In this work, we develop the environments, end-to-end-trained reinforcement learning agents, and evaluation protocols for EmbodiedQA.
Due to the high activation sparsity and use of accumulates (AC) instead of expensive multiply-and-accumulates (MAC), neuromorphic spiking neural networks (SNNs) have emerged as a promising low-power alternative to traditional DNNs for several computer vision (CV) applications. However, most existing SNNs require multiple time steps for acceptable inference accuracy, hindering real-time deployment and increasing spiking activity and, consequently, energy consumption. Recent works proposed direct encoding that directly feeds the analog pixel values in the first layer of the SNN in order to significantly reduce the number of time steps. Although the overhead for the first layer MACs with direct encoding is negligible for deep SNNs and the CV processing is efficient using SNNs, the data transfer between the image sensors and the downstream processing costs significant bandwidth and may dominate the total energy. To mitigate this concern, we propose an in-sensor computing hardware-software co-design framework for SNNs targeting image recognition tasks. Our approach reduces the bandwidth between sensing and processing by 12-96x and the resulting total energy by 2.32x compared to traditional CV processing, with a 3.8% reduction in accuracy on ImageNet.
Spiking neural networks (SNNs) have emerged as an attractive spatio-temporal computing paradigm for a wide range of low-power vision tasks. However, state-of-the-art (SOTA) SNN models either require multiple time steps, which hinders their deployment in real-time use cases, or significantly increase the training complexity. To mitigate this concern, we present a training framework (from scratch) for one-time-step SNNs that uses a novel variant of the recently proposed Hoyer regularizer. We estimate the threshold of each SNN layer as the Hoyer extremum of a clipped version of its activation map, where the clipping threshold is trained using gradient descent with our Hoyer regularizer. This approach not only downscales the value of the trainable threshold, thereby emitting a large number of spikes for weight update with a limited number of iterations (due to only one time step), but also shifts the membrane potential values away from the threshold, thereby mitigating the effect of noise that can degrade the SNN accuracy. Our approach outperforms existing spiking, binary, and adder neural networks in terms of the accuracy-FLOPs trade-off for complex image recognition tasks. Downstream experiments on object detection also demonstrate the efficacy of our approach.
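The thresholding idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the Hoyer regularizer takes the common form (sum |x|)^2 / sum x^2 and the Hoyer extremum the form sum x^2 / sum |x|, and the function names, the clipping range [0, clip_threshold], and the sample membrane potentials are all hypothetical.

```python
def hoyer_regularizer(values):
    """Assumed form of the Hoyer sparsity measure: (sum |x|)^2 / sum x^2."""
    l1 = sum(abs(v) for v in values)
    l2_sq = sum(v * v for v in values)
    return (l1 * l1) / l2_sq if l2_sq > 0 else 0.0


def hoyer_extremum(values):
    """Assumed form of the Hoyer extremum: sum x^2 / sum |x|."""
    l1 = sum(abs(v) for v in values)
    l2_sq = sum(v * v for v in values)
    return l2_sq / l1 if l1 > 0 else 0.0


def one_step_spikes(potentials, clip_threshold):
    """Hypothetical one-time-step layer: clip, derive threshold, emit spikes."""
    # Clip the membrane potentials to [0, clip_threshold]; in training the
    # clip_threshold would be a learned parameter (here it is just a number).
    clipped = [min(max(u, 0.0), clip_threshold) for u in potentials]
    # Every clipped value is <= clip_threshold, so this extremum can never
    # exceed clip_threshold: the effective firing threshold is downscaled.
    threshold = hoyer_extremum(clipped)
    spikes = [1 if u >= threshold else 0 for u in potentials]
    return threshold, spikes


# Illustrative membrane potentials for one layer (made-up numbers).
potentials = [0.2, 0.5, 1.5, -0.3, 0.9]
threshold, spikes = one_step_spikes(potentials, clip_threshold=1.0)
```

Because the derived threshold is bounded by the clipping value, a neuron whose potential sits just below the clipping value (0.9 in the sample data) can still fire, which is one way to read the abstract's claim that the downscaled threshold yields more spikes within a single time step.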
Solute transport in porous media is relevant to a wide range of applications in hydrogeology, geothermal energy, underground CO2 storage, and a variety of chemical engineering systems. Due to the complexity of solute transport in heterogeneous porous media, traditional solvers require high-resolution meshing and are therefore computationally expensive. This study explores the application of a mesh-free method based on deep learning to accelerate the simulation of solute transport. We employ Physics-informed Neural Networks (PiNN) to solve solute transport problems in homogeneous and heterogeneous porous media governed by the advection-dispersion equation. Unlike traditional neural networks that learn from large training datasets, PiNNs leverage only the strong-form mathematical models to simultaneously solve for multiple dependent or independent field variables (e.g., pressure and solute concentration fields). In this study, we construct the PiNN using a periodic activation function to better represent the complex physical signals (i.e., pressure) and their derivatives (i.e., velocity). Several case studies are designed to investigate the proposed PiNN's capability to handle different degrees of complexity. A manual hyperparameter tuning method is used to find the best PiNN architecture for each test case. Point-wise error and mean square error (MSE) measures are employed to assess the performance of the PiNNs' predictions against ground truth solutions obtained analytically or numerically using the finite element method. Our findings show that the predictions of the PiNN are in good agreement with the ground truth solutions while reducing computational complexity and cost by at least three orders of magnitude.
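To make the "strong form" idea concrete, the sketch below evaluates the residual of a 1D advection-dispersion equation, c_t + v c_x = D c_xx, which is the kind of quantity a physics-informed loss penalizes at collocation points. It is a minimal illustration, not the paper's method: there is no neural network here, derivatives are taken by central finite differences rather than automatic differentiation, and the constant-coefficient 1D setting, the parameter values, and all function names are assumptions. The candidate solution is the well-known Gaussian point-source solution, so its residual should be close to zero.

```python
import math


def concentration(x, t, v=1.0, D=0.1):
    """Analytic point-source solution of c_t + v*c_x = D*c_xx (unit mass at x=0, t=0)."""
    return math.exp(-((x - v * t) ** 2) / (4.0 * D * t)) / math.sqrt(4.0 * math.pi * D * t)


def ade_residual(c, x, t, v=1.0, D=0.1, h=1e-3):
    """Strong-form residual c_t + v*c_x - D*c_xx via central finite differences."""
    c_t = (c(x, t + h) - c(x, t - h)) / (2.0 * h)
    c_x = (c(x + h, t) - c(x - h, t)) / (2.0 * h)
    c_xx = (c(x + h, t) - 2.0 * c(x, t) + c(x - h, t)) / (h * h)
    return c_t + v * c_x - D * c_xx


def residual_mse(c, points, v=1.0, D=0.1):
    """Mean squared residual over collocation points (the physics part of a PINN-style loss)."""
    return sum(ade_residual(c, x, t, v=v, D=D) ** 2 for x, t in points) / len(points)


# Hypothetical collocation grid: x in [0.5, 2.5], t in [1.0, 1.5].
points = [(0.5 + 0.25 * i, 1.0 + 0.1 * j) for i in range(9) for j in range(6)]

mse_exact = residual_mse(concentration, points)
# A candidate that transports the solute at the wrong velocity violates the equation.
mse_wrong = residual_mse(lambda x, t: concentration(x, t, v=0.5), points)
```

In a PINN the candidate function would be the network itself, and minimizing this mean squared residual (together with boundary and initial condition terms) replaces training on labeled data.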
Motivated by mitigating potentially harmful impacts of technologies, the AI community has formulated and accepted mathematical definitions for certain pillars of accountability: e.g. privacy, fairness, and model transparency. Yet, we argue this is fundamentally misguided because these definitions are imperfect, siloed constructions of the human values they hope to proxy, while giving the guise that those values are sufficiently embedded in our technologies. Under popularized methods, tensions arise when practitioners attempt to achieve each pillar of fairness, privacy, and transparency in isolation or simultaneously. In this position paper, we push for redirection. We argue that the AI community needs to consider all the consequences of choosing certain formulations of these pillars -- not just the technical incompatibilities, but also the effects within the context of deployment. We point towards sociotechnical research for frameworks for the latter, but push for broader efforts into implementing these in practice.
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
Seizure type identification is essential for the treatment and management of epileptic patients. However, it is a difficult process known to be time-consuming and labor-intensive. Automated diagnosis systems, with the advancement of machine learning algorithms, have the potential to accelerate the classification process, alert patients, and support physicians in making quick and accurate decisions. In this paper, we present a novel multi-path seizure-type classification deep learning network (MP-SeizNet), consisting of a convolutional neural network (CNN) and a bidirectional long short-term memory neural network (Bi-LSTM) with an attention mechanism. The objective of this study was to classify specific types of seizures, including complex partial, simple partial, absence, tonic, and tonic-clonic seizures, using only electroencephalogram (EEG) data. The EEG data is fed to our proposed model in two different representations: the CNN receives wavelet-based features extracted from the EEG signals, while the Bi-LSTM receives the raw EEG signals, so that MP-SeizNet jointly learns from different representations of seizure data for more accurate information learning. The proposed MP-SeizNet was evaluated using the largest available EEG epilepsy database, the Temple University Hospital EEG Seizure Corpus, TUSZ v1.5.2. We evaluated our proposed model across different patient data using three-fold cross-validation and across seizure data using five-fold cross-validation, achieving F1 scores of 87.6% and 98.1%, respectively.
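Adapting model parameters to incoming streams of data is a crucial factor in deep learning scalability. Interestingly, prior continual learning strategies in online settings inadvertently anchor their updated parameters to a local parameter subspace in order to remember old tasks; otherwise they drift away from the subspace and forget. From this observation, we formulate a trade-off between constructing multiple parameter modes and allocating tasks per mode. Mode-Optimized Task Allocation (MOTA), our contributed adaptation strategy, trains multiple modes in parallel and then optimizes the task allocation per mode. We empirically demonstrate improvements over baseline continual learning strategies and across varying distribution shifts, namely sub-population, domain, and task shift.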
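Efficient custom pooling techniques that can aggressively trim the dimensions of a feature map, and thereby reduce inference compute and memory footprint for resource-constrained computer vision applications, have recently gained significant traction. However, prior pooling works extract only the local context of the activation maps, which limits their effectiveness. In contrast, we propose a novel non-local self-attentive pooling method that can be used as a drop-in replacement for standard pooling layers, such as max/average pooling or strided convolution. The proposed self-attention module uses patch embedding, multi-head self-attention, and spatial-channel restoration, followed by sigmoid activation and exponential softmax. This self-attention mechanism efficiently aggregates dependencies between non-local activation patches during down-sampling. Extensive experiments on standard object classification and detection tasks with various convolutional neural network (CNN) architectures demonstrate the superiority of our proposed mechanism over state-of-the-art (SOTA) pooling techniques. In particular, we surpass the test accuracy of existing pooling techniques on different variants of MobileNet-V2 on ImageNet by an average of 1.2%. With aggressive down-sampling of the activation maps in the initial layers (providing up to a 22x reduction in memory consumption), our approach achieves 1.43% higher test accuracy than SOTA techniques with iso-memory footprints. This enables the deployment of our models on memory-constrained devices, such as microcontrollers, without losing significant accuracy, because the initial activation maps consume a significant amount of on-chip memory for the high-resolution images required by complex vision tasks. Our proposed pooling method also leverages the idea of channel pruning to further reduce the memory footprint.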
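Neuromuscular diseases, such as Spinal Muscular Atrophy (SMA) and Duchenne Muscular Dystrophy (DMD), cause progressive muscular degeneration and loss of motor function in 1 in 6,000 children. Traditional upper-limb motor function assessments do not quantitatively measure patient-performed motions, which makes it difficult to track incremental changes in progress. Assessing motor function in children with neuromuscular diseases is particularly challenging because they may be nervous or excited during experiments, or simply too young to follow precise instructions. These challenges translate into confounding factors, such as performing different parts of an arm curl slower or faster (phase variability), which affect the assessed motion quality. This paper uses curve registration and shape analysis to temporally align trajectories while simultaneously extracting a mean reference shape. Distances from this mean shape are used to assess the quality of motion. The proposed metric is invariant to confounding factors such as phase variability, while suggesting several clinically relevant insights. First, the functional scores of the control and patient populations differ with statistical significance ($p = 0.0213 \le 0.05$). Next, several patients in the patient cohort are able to perform motions on par with the healthy cohort, and vice versa. Our metric, which is computed from wearable devices, is related to Brooke's score ($p = 0.00063 \le 0.05$) as well as to motor function assessments based on dynamometry ($p = 0.0006 \le 0.05$). These results show promise for ubiquitous motion quality assessment in daily life.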